core::simd introduction #21

Draft
loic-fejoz wants to merge 6 commits into FutureSDR:main from loic-fejoz:feat/perf-convert-u8-ff
@loic-fejoz
Collaborator

When the simd feature gate is enabled, the implementation uses core::simd to make loop vectorization explicit.

This PR optimizes Deinterleave, TypeConverter, and FreqShift using std::simd and specialization, with performance gains of up to 4.4x. The SIMD implementation sits behind a simd feature gate, so the crate still builds on stable Rust. CI/CD concerns are addressed by switching to min_specialization and allowing incomplete_features.
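For readers unfamiliar with feature-gated fast paths, here is a minimal sketch of the compile-time selection, assuming a `simd = []` entry under `[features]` in Cargo.toml. The function names, the `scale` parameter, and the plain multiply-by-scale conversion are hypothetical and are not taken from this PR. To keep the sketch compilable on stable Rust, the gated path uses an ordinary 8-element chunked loop where the actual implementation would use `std::simd` vectors on nightly.

```rust
/// Scalar path: always compiled, so the crate builds on stable Rust.
/// (Hypothetical helper; the scaling rule is an assumption.)
fn convert_u8_to_f32_scalar(input: &[u8], output: &mut [f32], scale: f32) {
    for (o, &i) in output.iter_mut().zip(input) {
        *o = f32::from(i) * scale;
    }
}

/// Entry point compiled only when the `simd` feature is enabled.
/// The inner loop over each 8-element chunk marks the place where a
/// `std::simd` vector operation would go on nightly.
#[cfg(feature = "simd")]
pub fn convert_u8_to_f32(input: &[u8], output: &mut [f32], scale: f32) {
    const LANES: usize = 8;
    let n = input.len().min(output.len());
    let full = n / LANES * LANES;
    for (src, dst) in input[..full]
        .chunks_exact(LANES)
        .zip(output[..full].chunks_exact_mut(LANES))
    {
        for (o, &i) in dst.iter_mut().zip(src) {
            *o = f32::from(i) * scale;
        }
    }
    // Elements that do not fill a whole chunk go through the scalar path.
    convert_u8_to_f32_scalar(&input[full..n], &mut output[full..n], scale);
}

/// Entry point compiled when the `simd` feature is disabled.
#[cfg(not(feature = "simd"))]
pub fn convert_u8_to_f32(input: &[u8], output: &mut [f32], scale: f32) {
    convert_u8_to_f32_scalar(input, output, scale);
}
```

Callers see a single `convert_u8_to_f32` either way, so enabling or disabling the feature does not change the public API.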

- Enable #![feature(portable_simd)] and #![feature(specialization)] (Nightly).
- Implement DeinterleaveSupported trait to provide specialized SIMD paths.
- Add SIMD-accelerated deinterleaving for f32, u8, i8, and i16 using an 8-lane configuration.
- Introduce deinterleave_scalar_logic for the base case and tail processing.
- Use a macro to reduce duplication across SIMD-supported types.
- Add comprehensive unit tests covering odd sample counts and multiple types.
- Add a Criterion benchmark to track deinterleaving performance (reaching ~1 Gelem/s).
- Update documentation in AGENTS.md and agent_docs/ to reflect the new performance pattern.
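The main-loop-plus-tail structure described in the list above can be sketched as follows. This is an illustration in stable Rust, not the code from this PR: `deinterleave_scalar_logic` is named in the commit message, but its signature here is an assumption, and `deinterleave_chunked` and `LANES` are hypothetical names. The fixed-size inner loop stands in for the 8-lane `std::simd` operations.

```rust
/// Pairs handled per chunk; mirrors the 8-lane configuration.
const LANES: usize = 8;

/// Scalar base case, also used for the tail that does not fill a chunk.
/// (Signature assumed for illustration.)
fn deinterleave_scalar_logic<T: Copy>(input: &[T], out0: &mut [T], out1: &mut [T]) {
    for ((pair, a), b) in input
        .chunks_exact(2)
        .zip(out0.iter_mut())
        .zip(out1.iter_mut())
    {
        *a = pair[0];
        *b = pair[1];
    }
}

/// Splits `input` (a0, b0, a1, b1, ...) into `out0` and `out1`.
/// Returns the number of pairs written; a trailing unpaired sample is ignored.
fn deinterleave_chunked<T: Copy>(input: &[T], out0: &mut [T], out1: &mut [T]) -> usize {
    let pairs = (input.len() / 2).min(out0.len()).min(out1.len());
    let full = pairs / LANES * LANES;

    // Main loop: whole chunks of LANES pairs (2 * LANES input elements).
    for ((src, a), b) in input[..2 * full]
        .chunks_exact(2 * LANES)
        .zip(out0[..full].chunks_exact_mut(LANES))
        .zip(out1[..full].chunks_exact_mut(LANES))
    {
        for i in 0..LANES {
            a[i] = src[2 * i];
            b[i] = src[2 * i + 1];
        }
    }

    // Tail: remaining pairs go through the scalar base case.
    deinterleave_scalar_logic(
        &input[2 * full..2 * pairs],
        &mut out0[full..pairs],
        &mut out1[full..pairs],
    );
    pairs
}
```

With an odd number of input samples, for example 19, the function writes 9 pairs and leaves the last sample untouched, which is the kind of edge case the unit tests mentioned above are meant to cover.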
…e gate

- Implement SIMD-accelerated scaled conversion for u8 -> f32 (3x speedup).
- Add 'simd' feature gate to Cargo.toml to allow stable Rust compilation.
- Refactor Deinterleave and TypeConverter to use conditional compilation for SIMD.
- Add TypeConvertSupported trait for specialized type conversions.
- Add Criterion benchmark for type converters.
- Update SIMD.md with roadmap and established patterns.
- Implement FreqShiftSupported trait with specialized SIMD path for Complex32.
- Achieve ~4.4x speedup on complex rotation compared to naive scalar loop.
- Use vectorized complex multiplications with periodic NCO re-sync (every 1024 samples) to maintain numerical precision.
- Add Criterion benchmark for FreqShift.
- Update SIMD.md with performance results.
- Address clippy::clone_on_copy warning by dereferencing NCO.
- Switch from specialization to min_specialization for better stability.
- Add #[allow(incomplete_features)] to src/lib.rs to satisfy clippy in CI/CD.
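To illustrate the periodic NCO re-sync mentioned above, here is a minimal scalar sketch under stated assumptions. It shows only the re-sync idea, not the vectorized complex multiplication. `C32` is a local stand-in for `num_complex::Complex32` so the example needs no dependencies, and `freq_shift`, `from_phase`, `cmul`, and `RESYNC_INTERVAL` are hypothetical names. The interval of 1024 samples is the value stated in the commit message.

```rust
/// Minimal stand-in for `num_complex::Complex32` (assumption: keeps the sketch dependency-free).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C32 {
    re: f32,
    im: f32,
}

impl C32 {
    /// Complex multiplication.
    fn cmul(self, o: C32) -> C32 {
        C32 {
            re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re,
        }
    }

    /// Unit phasor e^{j * phase}, computed in f64 and rounded to f32.
    fn from_phase(phase: f64) -> C32 {
        C32 {
            re: phase.cos() as f32,
            im: phase.sin() as f32,
        }
    }
}

/// Samples between exact recomputations of the oscillator phasor.
const RESYNC_INTERVAL: usize = 1024;

/// Rotates `samples` in place by `phase_inc` radians per sample, starting
/// at `phase`. Returns the phase to use for the next call.
fn freq_shift(samples: &mut [C32], mut phase: f64, phase_inc: f64) -> f64 {
    let step = C32::from_phase(phase_inc);
    for block in samples.chunks_mut(RESYNC_INTERVAL) {
        // Re-sync: rebuild the phasor from the f64 phase so that f32
        // rounding from repeated multiplication cannot build up across blocks.
        let mut osc = C32::from_phase(phase);
        for s in block.iter_mut() {
            *s = s.cmul(osc);
            // Cheap recurrence inside the block: one complex multiply per sample.
            osc = osc.cmul(step);
        }
        phase = (phase + phase_inc * block.len() as f64).rem_euclid(std::f64::consts::TAU);
    }
    phase
}
```

Within a block the oscillator advances by multiplication, which avoids a sin/cos call per sample. Rounding error then grows only over at most 1024 steps before the phasor is recomputed exactly.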